Lab: search and replace

# questions (answers below)
1. searching
	a) search for "meaning" (you can use this document)
	b) use n and N to go to the next and previous occurrences of "meaning"

2. patterns
	a) search for "good" or "bad" with a single search
	b) create a character class for finding vowels
	c) use curly braces to recreate the '*', '+' and '?' specifiers
	d) find 5 digit numbers, such as 12345 or 16845
	e) find words starting with a vowel
	f) find empty lines (0 characters)
	g) find blank lines (only whitespace allowed)
	h) find comment lines starting with #
	i) find the literal string "[/-:]"

3. patterns advanced
	a) why is \C only useful if the "ignorecase" option is enabled?
	b) what would [/-:] match?
	c) can you use the vim help to find out what \x matches in a pattern?
	d) can you find how to match a tab character in the vim help?
	e) add commas to 6 digit numbers as in the video, but do it without grouping and back references
	f) what is the best regex for finding an email address?
		john.doe@nobody.com
		not.you@nobody.com
		unusual@192.168.0.1
		_also_valid@elsewhere.tv

4. substitute command
	a) use substitute to capitalize the word "beer"
	b) rerun the previous substitution on every occurrence of "beer" in the document
	c) turn all water into wine
	d) count how many times the word "find" occurs in this document
	e) suffix the last word on every line with "nya"
	f) replace all occurrences of "the" with "tha", but only for question 4.
	g) find patterns of the form s/// and swap the pattern with the replacement
	h) use substitute to make the first letter of every word upper case and the rest lower case
		(requires help)


# answers
1. searching
	a) /meaning
		if you don't want the result to be demeaning or meaningless,
		then you will want to add word boundaries
		/\<meaning\>
		this way, you are guaranteed to find meaning.

	b) n N

2. patterns
	a) some options
		/good\|bad
		/\vgood|bad              very magic mode "\v", to not escape
		/\<\(good\|bad\)\>       added word boundaries, to exclude partial matches
		/\v<(good|bad)>          
		/\<\(goo\|ba\)d\>        moved the shared letter d out of the grouping
		/\v<(goo|ba)d>           

	b) [aeiou]                  (the vowels can be in any order)

	c) in very magic "\v" notation (without \v, write \? \+ and \{0,1\} \{\} \{1,\})
		? == {0,1}
		* == {} or {0,}
		+ == {1,}

	d) 
		/[0-9]\{5\}        will match 5 digits, but will match a part of larger numbers as well
		/\v[0-9]{5}        with very magic mode, for less escaping
		/\d\{5\}           using the predefined character class for digits
		/\v\d{5}
		/\<\d\{5\}\>       word boundaries will exclude larger numbers
		/\v<\d{5}>               
		

	e) key components are:
		\<                 first a word boundary, otherwise a vowel in the middle of a word matches
		[aeiou]            then we need a vowel, as in b)
		[a-z]\+            lastly, match the rest of the word (lower case letters only, unless "ignorecase" is set)

		some alternatives
			/\<[aeiou][a-z]\+       the components above put together
			/\v<[aeiou][a-z]+       very magic mode "\v" to not escape [<+]
			/\<[aeiou]\a\+          \a is short for [a-zA-Z], so the rest of the word may be upper case too
			/\v<[aeiou]\a+           
			/\<[aeiouAEIOU]\a\+     making the vowels case insensitive as well
			/\v<[aeiouAEIOU]\a+     
			/\c\<[aeiou]\a\+        \c is more convenient to make case insensitive
			/\c\v<[aeiou]\a+     
			/\<[aeiou]\c\a\+        \c can be placed elsewhere as well

	f) /^$
	g) /^\s*$

	h) some options
		/#.*                 will match hashes and anything that follows a hash '#'
		/^#.*                will only match comments starting at the beginning of a line
		/^\s*#.*             will still match if there is whitespace before the hash
		/^\s*\zs#.*          same, but whitespace is not a part of the match

	i) /[/-:] won't work, because it is a character class
		/\[\/-:\]            we can escape all magic characters
		/\[\/-:]             although vim still understands if we don't escape the closing bracket
		/\V[\/-:]            or we can disable magic


3. patterns advanced
	a) Searches are case sensitive by default.
		\C forces a search to be case sensitive, so by default it changes nothing.
		It only has an effect once the user has set "ignorecase": then \C switches case sensitivity back on for that one pattern.

	b) [/-:] matches '/', ':', or any digit [0-9]. It does NOT match a dash.
		Technically [/-:] means from '/' to ':', which doesn't mean anything for us humans.
		Vim, however, uses the character codes to create the range.
		If you google "ascii character set", then you will find '/' at 47, the digits at 48 to 57 and ':' at 58.
		I wouldn't write character classes like these, but it is technically possible.
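
		If you'd rather not read an ASCII table, you can also cross-check this outside Vim.
		A small sketch in Python, whose re module also builds ranges from character codes.
		It is only an illustration (the variable names are made up), not Vim itself:

```python
import re

# The same character class, written for Python's re module.
char_class = re.compile(r"[/-:]")

# Collect every ASCII character the class accepts, in character code order.
matched = "".join(chr(c) for c in range(128) if char_class.fullmatch(chr(c)))

print(matched)                           # everything from '/' (47) to ':' (58)
print(bool(char_class.fullmatch("-")))   # the dash (45) lies outside that range
```

		The first line printed is the slash, the ten digits and the colon. The second is False.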

	c) ":help \x" shows that it matches a hexadecimal digit

	d) the short answer is "\t"
		if you knew this then you would do ":help \t"
		but if you already knew the answer, then you didn't need the help.
		":help tab" brings you to the help page of the tab keyboard shortcut, not what we want.
		My recommendation would be to find the pattern page first ":help pattern.txt".
		":help pattern" works as well, same document, different location.
		If you search for /tab, then you will find [:tab:] which could be used in a character class.
		The entry for [:tab:] talks about a <Tab> character, so that would be a better search term.
		/<Tab> will bring you to "\t" eventually.
		Of course, if you had done a case insensitive search (e.g. "/tab\c"),
		then you would have found it straight away.

	e) the trick is to find the spot to insert the comma, without matching any text.
		We can achieve this with \zs and \ze, which set the start and the end of the match.
		for the first digits: \d{1,3}\zs
		for the last digits: \ze\d{3}
		the full solution is then:

		:%s/\v\d{1,3}\zs\ze\d{3}/,/g

		Since there are zero characters between \zs & \ze, the match is empty and no text gets replaced.
		\d{1,3} is greedy, so in 123456 it takes "123" and leaves "456" for \d{3}.
		The match is the spot in between, and the substitute simply inserts a comma there.
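
		The same "empty match" idea can be tried outside Vim. A minimal sketch in Python:
		Python has no \zs or \ze, so this version uses lookarounds instead, and it adds a
		word boundary \b that the Vim pattern does not need. Treat it as a related
		illustration for numbers of 4 to 6 digits, not as an exact translation.

```python
import re

# The spot we want: a digit right before it, and exactly three digits
# up to the end of the number after it. Nothing is consumed, so the
# match is empty and the substitution only inserts text.
COMMA_SPOT = re.compile(r"(?<=\d)(?=\d{3}\b)")

def add_commas(text):
    """Insert a comma before the last three digits of each number."""
    return COMMA_SPOT.sub(",", text)

print(add_commas("123456"))          # 123,456
print(add_commas("654321 and 42"))   # 654,321 and 42
```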

	f) the question is a joke. Email addresses are notoriously difficult for regular expressions.
		If you google the question you will find endlessly complex examples.
		We can make a rough version though, that is usually good enough:

		/\v[^@ \t]+[@][^@ \t]+[.][^@ \t.]+

		Let's break that down:
		first we need one or more characters that are not an '@', space ' ', or tab '\t' => [^@ \t]+
		(both '@' and ' ' are actually possible if quoted or escaped, but sufficiently rare to ignore)
		The at symbol => [@] or \@ (a bare @ is special in very magic mode)
		Once again some characters [^@ \t]+ before the dot [.]
		You can use \. instead, but this version makes it a little easier to see the structure.
		And finally, something should follow the dot, but not contain a dot, '@' or whitespace [^@ \t.]+
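
		To see how rough "rough" is, here is the same pattern translated to Python's re module.
		This is only a sketch for experimenting. The two extra test strings at the end are my own
		additions and are not part of the exercise.

```python
import re

# Python translation of the Vim pattern; a bare @ needs no brackets here.
ROUGH_EMAIL = re.compile(r"[^@ \t]+@[^@ \t]+[.][^@ \t.]+")

samples = [
    "john.doe@nobody.com",
    "not.you@nobody.com",
    "unusual@192.168.0.1",
    "_also_valid@elsewhere.tv",
]

for address in samples:
    # fullmatch: the whole string has to fit the pattern
    print(address, bool(ROUGH_EMAIL.fullmatch(address)))

# Without a dot after the '@' the pattern says no ...
print(bool(ROUGH_EMAIL.fullmatch("nobody@localhost")))
# ... but a doubled dot in the domain slips through.
print(bool(ROUGH_EMAIL.fullmatch("a@b..c")))
```

		All four samples from the question match. The doubled dot matching as well is the
		kind of false positive you accept with a pattern this simple.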

4. substitute command
	a) some options
		:s/beer/BEER/            only matches lowercase "beer" unless "ignorecase" is on
		:s/beer/BEER/i           using a flag to make it case insensitive
		:s/beer\c/BEER/          using \c instead to make it case insensitive
		:s/beer/\U&/i            technically works as well, but I didn't discuss \U (upper case)

	b) some options: 
		:%s/beer/BEER/ig
		:%s//~/g           since we just ran the command, an empty pattern and ~ reuse its pattern and replacement
		:%&g               if we really want to save on keystrokes

	c) :%s/water/wine/g 	

	d) some options
		:%s/find//gn           counts the lower case matches (the n flag only counts, nothing is replaced)
		:%s/find//gni          case insensitive, so also counts Find
		:%s/\<find\>//gni      word boundaries, to exclude "finds"

	e) some steps to get a good solution
		           :%s/$/nya/    will add "nya" at the end of every line
		     :%s/\ze\A*$/nya/    will move "nya" before any punctuation
		:%s/\a\zs\ze\A*$/nya/    will skip lines that don't contain words

	f) for this, we have to specify a range that restricts the replace to question 4.
		You can do it with line numbers, but if I change this document I have to fix the solution.
		Instead I will create a range using 2 search patterns:
		/^4[.]          finds the start of question 4.
		/#\s*answers    stops at the start of the next section
		that gives me the range /^4[.]/,/#\s*answers/
		Both patterns search forward from the cursor line, so start at the top of the document (gg).
		the complete answer:
		:/^4[.]/,/#\s*answers/ s/the/tha/g

	g) in order to do the swap, we create capturing groups for both the pattern and the replacement
		None of the commands in this document use a '/' in the pattern or replacement.
		So that keeps things simple. Either side just contains anything but the separator: ([^/]*)
		The complete pattern looks like this: \vs/([^/]*)/([^/]*)/
		For the replacement we simply use back references to swap left with right: s/\2/\1/
		I will use a different separator '@' for the wrapping substitution command:
		:%s@s/\v([^/]*)/([^/]*)/@s/\2/\1/@g

		or, if done in two steps:
		/s\/\v([^/]*)\/([^/]*)\/
		:%s@@s/\2/\1/@g

		This expression does erroneously detect the solution to f)
		It detects the last 's' of "answers" as the start of the expression.
		Dealing with that complexity isn't really the goal of the exercise.
		But if you have to get it to work, you could require a ':', '%' or '/' right before the 's' with [:%/]\zs
		:%s@[:%/]\zss/\v([^/]*)/([^/]*)/@s/\2/\1/@g
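
		The false match is easy to reproduce outside Vim. A sketch in Python, with both patterns
		translated to the re module. The names NAIVE, GUARDED and swap are made up for this
		example, and the lookbehind (?<=...) stands in for Vim's [:%/]\zs.

```python
import re

# Python translation of the simple pattern (illustration only, not Vim itself).
NAIVE = re.compile(r"s/([^/]*)/([^/]*)/")
# Same pattern, but a ':', '%' or '/' must sit right before the 's'.
GUARDED = re.compile(r"(?<=[:%/])s/([^/]*)/([^/]*)/")

def swap(pattern, line):
    """Swap pattern and replacement of every s/// that `pattern` finds."""
    return pattern.sub(r"s/\2/\1/", line)

water = ":%s/water/wine/g"
tricky = r":/^4[.]/,/#\s*answers/ s/the/tha/g"

print(swap(NAIVE, water))     # :%s/wine/water/g
print(swap(NAIVE, tricky))    # mangled: the match starts at the last 's' of "answers"
print(swap(GUARDED, tricky))  # unchanged: no ':', '%' or '/' directly before an "s/"
```

		Note that the guarded version no longer mangles that line, but it doesn't swap it either,
		because the real s/the/tha/ there is preceded by a space.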
		
	h) since we don't want to add every word in the document to the expression,
		we need some way of referencing words and changing their case.
		if you checked :help \0 you might have seen \u \U \l \L \e \E

		\u next char upper case
		\U upper case until \E
		\l next char lower case
		\L lower case until \E
		\e resume original case
		\E resume original case

		With that in mind we can simply find all words (sequences of \a)
		/\a\+

		and then use \L to make the whole word lower case and \u to make its first letter upper case
		:%s/\a\+/\u\L&/g
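
		For comparison, the same idea sketched in Python. There are no \u or \L in Python's
		replacement strings, so a small function does the case change instead. The name
		title_case is made up for this example.

```python
import re

def title_case(text):
    """Upper-case the first letter of every run of ASCII letters, lower-case the rest."""
    # [A-Za-z]+ plays the role of Vim's \a\+ ; str.capitalize() does the job of \u\L&
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), text)

print(title_case("hELLO wORLD"))   # Hello World
print(title_case("it's 2 LATE"))   # It'S 2 Late
```

		The second line shows a side effect of matching runs of letters: the apostrophe
		splits "it's" into two words, so the s is capitalized too.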